Semi-Supervised Acquisition of a Spanish Lexicon from a Portuguese Seed Lexicon
Abstract
This paper addresses the automated acquisition of a Spanish medical subword lexicon from an existing Portuguese seed lexicon. Using two non-parallel monolingual corpora, we derive Spanish lexeme candidates from Portuguese seed lexicon entries by heuristic cognate mapping. The experiments are still in progress, and we are working toward a reliable method for validating the resulting translation hypotheses.
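As a rough illustration of what heuristic cognate mapping can look like, the sketch below rewrites Portuguese strings with a handful of spelling substitutions and keeps only the results attested in a Spanish vocabulary. The rule list, the function names (cognate_candidates, map_seed_lexicon) and the toy data are assumptions made for this example and are not taken from the paper, which works on subword entries and its own rule set.

```python
from itertools import combinations

# Hypothetical Portuguese -> Spanish spelling substitutions (illustrative only,
# not the rule set used in the paper). Order matters: longer patterns first.
RULES = [
    ("ção", "ción"),
    ("nh", "ñ"),
    ("lh", "j"),
    ("ss", "s"),
    ("ç", "z"),
]


def cognate_candidates(pt_word):
    """Return all strings produced by applying any subset of RULES in order."""
    candidates = {pt_word}  # identical spelling is also a candidate
    for size in range(1, len(RULES) + 1):
        for subset in combinations(RULES, size):
            word = pt_word
            for source, target in subset:
                word = word.replace(source, target)
            candidates.add(word)
    return candidates


def map_seed_lexicon(pt_seed, es_vocab):
    """Keep only the candidates that occur in the Spanish corpus vocabulary."""
    return {pt: sorted(cognate_candidates(pt) & es_vocab) for pt in pt_seed}


# Toy data standing in for the seed lexicon and the Spanish corpus vocabulary.
pt_seed = ["infecção", "unha", "osso", "pulso"]
es_vocab = {"infección", "uña", "pulso", "hueso"}

hypotheses = map_seed_lexicon(pt_seed, es_vocab)
for pt_entry, es_entries in hypotheses.items():
    print(pt_entry, "->", es_entries)
```

In this toy run "osso" receives no hypothesis, because "hueso" cannot be reached with the listed substitutions. Surface rules of this kind only propose candidates, so a separate validation step is still needed.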
Similar resources
Cognate Mapping - A Heuristic Strategy for the Semi-Supervised Acquisition of a Spanish Lexicon from a Portuguese Seed Lexicon
We deal with the automated acquisition of a Spanish medical subword lexicon from an already existing Portuguese seed lexicon. Using two non-parallel monolingual corpora we determined Spanish lexeme candidates from Portuguese seed lexicon entries by heuristic cognate mapping. We validated the emergent lexical translation hypotheses by determining the similarity of fixed-window context vectors on...
Automatic Lexicon Acquisition for a Medical Cross-Language Information Retrieval System
We present a method for the automated acquisition of a multilingual medical lexicon (for Spanish and Swedish) to be used within the framework of a medical cross-language text retrieval system. We incorporate seed lexicons and parallel corpora derived from the UMLS Metathesaurus. The seed lexicons for Spanish and Swedish are automatically generated from (previously manually constructed) Portugue...
Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning
Morpho-syntactic lexicons provide information about the morphological and syntactic roles of words in a language. Such lexicons are not available for all languages and even when available, their coverage can be limited. We present a graph-based semi-supervised learning method that uses the morphological, syntactic and semantic relations between words to automatically construct wide coverage lex...
Semi-supervised learning of morphological paradigms and lexicons
We present a semi-supervised approach to the problem of paradigm induction from inflection tables. Our system extracts generalizations from inflection tables, representing the resulting paradigms in an abstract form. The process is intended to be language-independent, and to provide human-readable generalizations of paradigms. The tools we provide can be used by linguists for the rapid creation...
Senseval-3: The Spanish lexical sample task
In this paper we describe the Spanish Lexical Sample task. This task was initially devised for evaluating the role of unlabeled examples in supervised and semi-supervised learning systems for WSD and it was coordinated with five other lexical sample tasks (Basque, Catalan, English, Italian, and Rumanian) in order to share part of the target words. Firstly, we describe the methodology followed t...